A large number of duplicate images in a database not only degrades the performance of the learner but also consumes considerable storage space. To deduplicate massive image collections, a duplicate detection algorithm based on pHash (perceptual hashing) was proposed. Firstly, the pHash values of all images were generated. Secondly, each pHash value was divided into several parts of equal length; if two images had equal values in any one of these parts, they were treated as possible duplicates. Finally, the transitivity of image duplication was discussed, and corresponding algorithms for the transitive case and the non-transitive case were proposed. Experimental results show that the proposed algorithms are effective in processing massive image collections. When the similarity threshold is 13, detecting duplicates among nearly 300000 images with the proposed transitive algorithm takes only about two minutes, with an accuracy of around 53%.
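The segment-matching step can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes 64-bit hashes stored as integers, 8 segments, and a Hamming-distance threshold of 5, and every function name is hypothetical. With k equal-length segments, any two hashes that differ in fewer than k bits must agree on at least one segment, so bucketing by segment yields a candidate set that is then verified by Hamming distance. The transitive case is modelled here with a union-find structure, which is an assumption about how transitivity might be realised.

```python
from collections import defaultdict
from itertools import combinations

HASH_BITS = 64  # assumed hash width; must be divisible by the segment count


def split_hash(h, parts):
    """Split a HASH_BITS-bit integer into `parts` equal-length segments."""
    seg_bits = HASH_BITS // parts
    mask = (1 << seg_bits) - 1
    return [(h >> (i * seg_bits)) & mask for i in range(parts)]


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def candidate_pairs(hashes, parts):
    """Pairs of indices whose hashes agree on at least one segment."""
    buckets = defaultdict(list)
    for idx, h in enumerate(hashes):
        for pos, seg in enumerate(split_hash(h, parts)):
            buckets[(pos, seg)].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def duplicate_groups(hashes, parts=8, threshold=5):
    """Group images transitively: a~b and b~c put a, b, c in one group."""
    parent = list(range(len(hashes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in candidate_pairs(hashes, parts):
        # Verify each candidate; sharing a segment only suggests a duplicate.
        if hamming(hashes[i], hashes[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(hashes)):
        groups[find(i)].append(i)
    return sorted(g for g in groups.values() if len(g) > 1)
```

Pairs that differ in at least as many bits as there are segments may share no segment and be missed, so the number of segments bounds the thresholds for which this filter is exhaustive.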
Morphological reconstruction is a fundamental and critical operation in medical image processing, in which dilation operations are repeatedly applied to the marker image under the constraint of the mask image until the pixels of the marker image no longer change. To address the low computational efficiency of traditional CPU-based morphological reconstruction systems, the use of a Graphics Processing Unit (GPU) to accelerate morphological reconstruction was proposed. Firstly, a GPU-friendly data structure, the parallel heap cluster, was proposed. Then, based on the parallel heap cluster, a GPU-based morphological reconstruction system was designed and implemented. The experimental results show that, compared with a traditional CPU-based morphological reconstruction system, the proposed GPU-based system achieves a speedup of more than 20 times. The proposed system demonstrates how to efficiently port a software system built on complex data structures to the GPU.
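The iterative definition of morphological reconstruction can be sketched as follows. This is a naive, sequential CPU version for illustration only; it does not use the parallel heap cluster or any GPU code from the proposed system. The 3x3 (8-connected) neighbourhood, the list-of-lists image representation, and the function names are assumptions.

```python
def dilate(img):
    """Grayscale dilation with a 3x3 (8-connected) neighbourhood."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = max(
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            )
    return out


def reconstruct(marker, mask):
    """Dilate the marker, clamp it to the mask, repeat until nothing changes."""
    # Start from the marker clamped pointwise to the mask.
    current = [[min(m, k) for m, k in zip(mrow, krow)]
               for mrow, krow in zip(marker, mask)]
    while True:
        grown = dilate(current)
        nxt = [[min(g, k) for g, k in zip(grow, krow)]
               for grow, krow in zip(grown, mask)]
        if nxt == current:  # no pixel changed: reconstruction has converged
            return current
        current = nxt
```

For example, a single seed pixel in the marker spreads over the connected region of the mask that contains it, while regions separated by zero-valued mask pixels stay empty.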
Key-Value store systems are now widely used in various Internet services. However, existing Key-Value store systems, most of which run in user mode, cannot meet the demands of high concurrency and low latency. This is mainly because user mode usually provides inefficient access interfaces and transaction processing, owing to mode switches and context switches. To solve these problems, an in-kernel implementation of a Key-Value store system, called KStore, was proposed. It had an in-kernel index and an in-kernel memory allocator, which were used to manage Key-Value data efficiently. To guarantee low-latency responses, KStore provided a remote interface based on in-kernel sockets and a local interface based on the file system. In addition, KStore processed concurrent requests with a novel mechanism based on in-kernel multithreading. The experimental results show that KStore has a remarkable advantage over Memcached in terms of real-time performance and concurrency.
To address the poor scalability of the traditional hierarchical clustering algorithm when dealing with large-scale text, a parallel hierarchical clustering algorithm based on the MapReduce programming model was proposed. A vertical data partitioning algorithm based on the statistical characteristics of the component groups of text vectors was developed for data partitioning in MapReduce. Additionally, the sorting characteristics of MapReduce were applied to select the merge points, which made the algorithm more efficient and helped improve clustering accuracy. The experimental results show that the proposed algorithm is effective and has good scalability.
To improve the performance of the traditional eigenface method for face recognition under large illumination variation, a new face recognition method was proposed. Unlike second-order PCA face recognition, it applied independent component analysis (ICA), instead of principal component analysis (PCA), to the PCA residual eigenfaces to extract independent component (IC) features, and combined the IC features in the PCA residual face space with the IC features in the original face space to form the final features for recognition. Experiments show that, under large illumination and pose variations, it is more efficient than some conventional face recognition methods, such as the eigenface-based method, the ICA-based method, and the second-order PCA method, and that it is also practical.